PGS Educational Attainment

Data Preprocessing

  1. First, I identified which variables are time dependent and excluded those that were clearly unnecessary (i.e., “SITE”, “COLPROT”, “ORIGPROT”, “FLDSTRENG”, “FSVERSION”, “IMAGEUID”, “Month_bl”, “Month”, “M”, “update_stamp”).

  2. Merge time-dependent and time-independent variables into the long_dat data frame. I also recoded the time points in the VISCODE variable into integers (bl = 0, m03 = 1, m06 = 2, and so on).

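library(dplyr)

# Visit codes in chronological order; position - 1 gives the integer time point (bl = 0)
visit_levels <- c("bl", "m03", "m06", "m12", "m18", "m24", "m30", "m36", "m42",
                  "m48", "m54", "m60", "m66", "m72", "m78", "m84", "m90", "m96",
                  "m102", "m108", "m114", "m120", "m126", "m132", "m144", "m156")

long_dat <- dat[, c(ivars[, 1], nivars[, 1])] %>%
  mutate(VISCODE = match(VISCODE, visit_levels) - 1) %>%  # recode visits to 0..25
  relocate(RID, PTID, VISCODE) %>%                        # ID and time columns first
  arrange(RID, VISCODE)                                   # sort by participant, then time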
  3. The original data frame contained quite a few _bl or _BL variables, so I checked whether these baseline columns had already been integrated into the corresponding time-dependent variable for each participant. They had not.

  4. Therefore, I merged the _bl/_BL variables with the corresponding time-dependent variable for each participant. Additionally, I specified the data type of each variable individually for optimal control and oversight over the data structure.

  5. Transform Long to Wide Data Format

## # A tibble: 6 × 1,153
##   RID   PTID         AGE PTGENDER PTEDUCAT PTETHCAT PTRACCAT PTMARRY APOE4 FDG_0
##   <fct> <chr>      <dbl> <fct>       <int> <fct>    <fct>    <fct>   <int> <dbl>
## 1 2     011_S_0002  74.3 Male           16 Not His… White    Married     0  1.37
## 2 3     011_S_0003  81.3 Male           18 Not His… White    Married     1  1.08
## 3 4     022_S_0004  67.5 Male           10 Hisp/La… White    Married     0 NA   
## 4 5     011_S_0005  73.7 Male           16 Not His… White    Married     0  1.29
## 5 6     100_S_0006  80.4 Female         13 Not His… White    Married     0 NA   
## 6 7     022_S_0007  75.4 Male           10 Hisp/La… More th… Married     1 NA   
## # ℹ 1,143 more variables: FDG_2 <dbl>, FDG_7 <dbl>, FDG_11 <dbl>, FDG_12 <dbl>,
## #   FDG_13 <dbl>, FDG_14 <dbl>, FDG_15 <dbl>, FDG_16 <dbl>, FDG_17 <dbl>,
## #   FDG_18 <dbl>, FDG_19 <dbl>, FDG_21 <dbl>, FDG_22 <dbl>, FDG_23 <dbl>,
## #   FDG_24 <dbl>, FDG_3 <dbl>, FDG_4 <dbl>, FDG_5 <dbl>, FDG_6 <dbl>,
## #   FDG_9 <dbl>, FDG_8 <dbl>, FDG_10 <dbl>, FDG_25 <dbl>, FDG_20 <dbl>,
## #   FDG_1 <dbl>, PIB_0 <dbl>, PIB_2 <dbl>, PIB_7 <dbl>, PIB_11 <dbl>,
## #   PIB_12 <dbl>, PIB_13 <dbl>, PIB_14 <dbl>, PIB_15 <dbl>, PIB_16 <dbl>, …
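
The long-to-wide step can be sketched roughly as follows. This is a minimal illustration on made-up data, using base R `reshape()` as a stand-in; the report does not show which function was actually used, and `toy_long` and its values are hypothetical.

```r
# Toy long-format data: one row per participant and time point (values are made up)
toy_long <- data.frame(
  RID     = c(2, 2, 3, 3),
  VISCODE = c(0, 2, 0, 2),
  FDG     = c(1.37, 1.30, 1.08, 1.05),
  PIB     = c(NA, 1.5, NA, 1.9)
)

# Spread each time-dependent variable into one column per time point,
# named <variable>_<time point>, e.g. FDG_0 and FDG_2
toy_wide <- reshape(toy_long, idvar = "RID", timevar = "VISCODE",
                    direction = "wide", sep = "_")

names(toy_wide)
```

Each participant ends up on a single row, which is why the wide table has far more columns than the long one.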

Age Distribution in Data Frame

Attrition Analysis

Based on the number of participants measured at any time point I made a frequency plot to get a first idea of the sampling frequency.
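
A minimal sketch of how such per-time-point counts could be obtained. The data frame `toy_long` is made up, and treating a non-missing MMSE value as "measured" is an assumption for illustration only.

```r
# Toy long-format data: one row per participant (RID) and time point (VISCODE)
toy_long <- data.frame(
  RID     = c(1, 1, 1, 2, 2, 3),
  VISCODE = c(0, 2, 3, 0, 2, 0),
  MMSE    = c(29, 28, NA, 30, 27, 25)
)

# Keep rows with an actual measurement, then count distinct participants per time point
measured <- toy_long[!is.na(toy_long$MMSE), ]
n_per_visit <- tapply(measured$RID, measured$VISCODE,
                      function(x) length(unique(x)))

print(n_per_visit)
# barplot(n_per_visit, xlab = "Time point", ylab = "Participants measured")
```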

Domains

Demographics

Cognitive Tests

Biomedical Imaging

Biomarkers

Based on these findings, it appears that time point 9 is a cut-off after which the number of measurements drops quite strongly. Time point 9 corresponds to month 48 (i.e., 4 years) of the follow-up.

Polygenic Score for Educational attainment

By default, the merge(by.x, by.y) function performs an inner join: the new data frame keeps only those rows for which there is a matching key in both inputs (in our case PTID). As a result, the 2 additional individuals for whom we have genetic data but no other measurements are dropped. The final data frame, for which both testing data and genetic data are available, contains N = 1408 participants.
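
The join behaviour can be illustrated on toy data. The two small tables below are made up; only the column names `PTID`, `PTEDUCAT` and `EA22` are taken from this report.

```r
# Toy phenotype and genetic tables keyed by PTID (values are made up)
pheno <- data.frame(PTID = c("A", "B", "C"), PTEDUCAT = c(16, 12, 18))
pgs   <- data.frame(PTID = c("A", "B", "C", "D", "E"),
                    EA22 = c(0.21, -0.35, 0.48, 0.10, -0.05))

# Default merge(): inner join, only PTIDs present in both tables survive
inner <- merge(pheno, pgs, by = "PTID")

# all.y = TRUE keeps unmatched genetic rows, which makes the dropped IDs visible
right   <- merge(pheno, pgs, by = "PTID", all.y = TRUE)
dropped <- right$PTID[is.na(right$PTEDUCAT)]

nrow(inner)  # 3
dropped      # "D" "E"
```

Checking the unmatched keys this way is a quick sanity check that the number of dropped individuals is the expected one.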

Plot PGS EA vs. Actual EA

Based on this plot, we can see a positive relationship between the polygenic score for educational attainment and actual years of education. In other words, participants with a higher PGS, i.e., a higher genetic capacity for educational attainment, tend to have completed more years of education.

We ran Pearson’s correlation, which resulted in r = 0.286 (p-value < 2.2e-16).
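
For reference, a correlation test of this kind can be run with `cor.test()` from base R. The sketch below uses simulated values, so its numbers will not match the ones reported here; the effect size and noise level are arbitrary assumptions.

```r
set.seed(1)
# Simulated stand-ins for the polygenic score (EA22) and years of education (PTEDUCAT)
n <- 500
EA22 <- rnorm(n)
PTEDUCAT <- 16 + 0.8 * EA22 + rnorm(n, sd = 2.7)

# Pearson's product-moment correlation with its two-sided test
ct <- cor.test(EA22, PTEDUCAT, method = "pearson")

ct$estimate   # sample correlation r
ct$p.value    # p-value for H0: r = 0
ct$conf.int   # 95% confidence interval for r
```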

Check linear regression assumptions

## 
## Call:
## lm(formula = PTEDUCAT ~ PTGENDER, data = wide_dat)
## 
## Coefficients:
##  (Intercept)  PTGENDERMale  
##       15.343         1.036

## 
## Call:
## lm(formula = EA22 ~ PTGENDER, data = wide_dat)
## 
## Coefficients:
##  (Intercept)  PTGENDERMale  
##      0.15332       0.03494

## 
## Female   Male 
##    597    811
## Model has interaction terms. VIFs might be inflated.
##   You may check multicollinearity among predictors of a model without
##   interaction terms.

Create Residuals

To obtain the residuals, we regressed actual EA (years of education) on the polygenic score for educational attainment, including SEX and AGE as covariates. The distribution of the residuals is depicted in the density plot.

How to interpret the Residuals?

It is important to interpret the residual scores correctly. A residual is the actual value minus the predicted value, so a high (positive) residual means that the individual has over-performed relative to his or her genetic capacity, whereas a negative residual means under-performance. The following table illustrates this:

##   Actual Predicted  Residuals
## 1     18  17.07835  0.9216480
## 2     16  15.06331  0.9366934
## 3     12  16.66198 -4.6619756
## 4     20  15.87451  4.1254943
## 5     14  14.79803 -0.7980315
## 6     13  15.37074 -2.3707425
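
A table of this shape could be produced along the following lines. The data are simulated and the exact model formula is an assumption (the original may include further terms such as an interaction); only the column names follow this report.

```r
set.seed(42)
# Simulated stand-in data; column names follow the report
n <- 200
toy <- data.frame(
  EA22     = rnorm(n),
  AGE      = runif(n, 55, 90),
  PTGENDER = factor(sample(c("Female", "Male"), n, replace = TRUE))
)
toy$PTEDUCAT <- round(15 + 0.8 * toy$EA22 +
                        1 * (toy$PTGENDER == "Male") + rnorm(n, sd = 2.5))

# Regress actual education on the polygenic score plus covariates (assumed formula)
fit <- lm(PTEDUCAT ~ EA22 + AGE + PTGENDER, data = toy)

# Residual = actual - predicted; positive values mean more education than predicted
res_tab <- data.frame(
  Actual    = toy$PTEDUCAT,
  Predicted = fitted(fit),
  Residuals = resid(fit)
)
head(res_tab)
```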

Survival Analysis

Using the ntile function from dplyr, the lower tertile is assigned the value 1 (~ negative residual), the middle tertile the value 2, and the upper tertile the value 3 (~ positive residual). Follow-up is limited to time point 9 (i.e., 48 months).
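
As a small illustration of the tertile assignment, the sketch below applies `ntile()` to the six residuals from the table in the previous section (rounded). The data frame name `toy` is made up; `thirtile_res` is the variable name that appears in the model output of this report.

```r
library(dplyr)

# The six residuals from the Actual/Predicted/Residuals table (rounded)
toy <- data.frame(res = c(0.92, 0.94, -4.66, 4.13, -0.80, -2.37))

# ntile() ranks the values and splits them into 3 groups of (near-)equal size
toy <- toy %>% mutate(thirtile_res = ntile(res, 3))

toy$thirtile_res  # 2 3 1 3 2 1
```

Note that `ntile()` splits by rank, so the boundaries depend on the sample and the groups are balanced in size rather than defined by fixed residual values.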

Mini-Mental State Examination (MMSE)

“The mini–mental state examination (MMSE) is a 30-point questionnaire that is used extensively in clinical and research settings to measure cognitive impairment. It is commonly used in medicine and allied health to screen for dementia. It is also used to estimate the severity and progression of cognitive impairment and to follow the course of cognitive changes in an individual over time; thus making it an effective way to document an individual’s response to treatment. Administration of the test takes between 5 and 10 minutes and examines functions including registration (repeating named prompts), attention and calculation, recall, language, ability to follow simple commands and orientation. […] Any score of 24 or more (out of 30) indicates a normal cognition. Below this, scores can indicate severe (≤9 points), moderate (10–18 points) or mild (19–23 points) cognitive impairment.” (Wikipedia.org). The MMSE scores were normalized using the NormPsy package and then the cut-off was calculated.

## Linear mixed-effects model fit by REML
##   Data: long_dat 
##       AIC      BIC   logLik
##   89083.4 89149.12 -44532.7
## 
## Random effects:
##  Formula: ~1 | RID
##         (Intercept) Residual
## StdDev:    18.02488  11.6619
## 
## Fixed effects:  MMSE_norm ~ EA22 * VISCODE + AGE + PTGENDER + AGE * PTGENDER 
##                     Value Std.Error   DF    t-value p-value
## (Intercept)      98.50520  7.870223 9566  12.516189  0.0000
## EA22              4.34032  1.133075 1403   3.830570  0.0001
## VISCODE          -0.56170  0.030238 9566 -18.575872  0.0000
## AGE              -0.34389  0.107380 1403  -3.202574  0.0014
## PTGENDERMale      6.71428 10.527985 1403   0.637756  0.5237
## EA22:VISCODE      0.16246  0.057944 9566   2.803762  0.0051
## AGE:PTGENDERMale -0.10308  0.142123 1403  -0.725262  0.4684
##  Correlation: 
##                  (Intr) EA22   VISCOD AGE    PTGEND EA22:V
## EA22              0.037                                   
## VISCODE          -0.014  0.091                            
## AGE              -0.995 -0.060 -0.002                     
## PTGENDERMale     -0.746  0.000 -0.007  0.742              
## EA22:VISCODE      0.008 -0.229 -0.430 -0.001  0.000       
## AGE:PTGENDERMale  0.750 -0.003  0.006 -0.753 -0.995  0.000
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -5.63795179 -0.56880434  0.05285348  0.60949222  3.28585409 
## 
## Number of Observations: 10976
## Number of Groups: 1408
## Linear mixed-effects model fit by REML
##   Data: long_dat 
##        AIC      BIC    logLik
##   89022.23 89087.95 -44502.11
## 
## Random effects:
##  Formula: ~1 | RID
##         (Intercept) Residual
## StdDev:    17.63267 11.65521
## 
## Fixed effects:  MMSE_norm ~ PTEDUCAT * VISCODE + AGE + PTGENDER + AGE * PTGENDER 
##                     Value Std.Error   DF   t-value p-value
## (Intercept)      68.95471  8.438976 9566  8.170981  0.0000
## PTEDUCAT          1.42825  0.181115 1403  7.885861  0.0000
## VISCODE          -1.16534  0.157143 9566 -7.415779  0.0000
## AGE              -0.22913  0.105411 1403 -2.173722  0.0299
## PTGENDERMale      8.93915 10.317433 1403  0.866412  0.3864
## PTEDUCAT:VISCODE  0.03987  0.009617 9566  4.146364  0.0000
## AGE:PTGENDERMale -0.15558  0.139368 1403 -1.116343  0.2645
##  Correlation: 
##                  (Intr) PTEDUCAT VISCOD AGE    PTGEND PTEDUCAT:
## PTEDUCAT         -0.408                                        
## VISCODE          -0.078  0.223                                 
## AGE              -0.940  0.085   -0.002                        
## PTGENDERMale     -0.691  0.022   -0.004  0.742                 
## PTEDUCAT:VISCODE  0.078 -0.228   -0.985  0.001  0.002          
## AGE:PTGENDERMale  0.701 -0.040    0.004 -0.754 -0.995 -0.003   
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -5.60008766 -0.56922539  0.05326356  0.60800329  3.13349834 
## 
## Number of Observations: 10976
## Number of Groups: 1408
## Linear mixed-effects model fit by REML
##   Data: long_dat 
##        AIC     BIC    logLik
##   89045.97 89111.7 -44513.99
## 
## Random effects:
##  Formula: ~1 | RID
##         (Intercept) Residual
## StdDev:    17.75683 11.66101
## 
## Fixed effects:  MMSE_norm ~ res * VISCODE + AGE + PTGENDER + AGE * PTGENDER 
##                     Value Std.Error   DF    t-value p-value
## (Intercept)      96.78956  7.754924 9566  12.481046  0.0000
## res               3.56566  0.506156 1403   7.044589  0.0000
## VISCODE          -0.52404  0.027287 9566 -19.204418  0.0000
## AGE              -0.31197  0.105695 1403  -2.951615  0.0032
## PTGENDERMale      7.10497 10.382522 1403   0.684320  0.4939
## res:VISCODE       0.07652  0.027321 9566   2.800864  0.0051
## AGE:PTGENDERMale -0.10679  0.140160 1403  -0.761947  0.4462
##  Correlation: 
##                  (Intr) res    VISCOD AGE    PTGEND r:VISC
## res              -0.001                                   
## VISCODE          -0.011 -0.002                            
## AGE              -0.995  0.001 -0.004                     
## PTGENDERMale     -0.747  0.003 -0.008  0.743              
## res:VISCODE      -0.002 -0.235 -0.006  0.002  0.003       
## AGE:PTGENDERMale  0.750 -0.003  0.007 -0.754 -0.995 -0.003
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -5.61752013 -0.56722517  0.05201827  0.60954220  3.15234383 
## 
## Number of Observations: 10976
## Number of Groups: 1408
## Linear mixed-effects model fit by REML
##   Data: long_dat 
##        AIC   BIC    logLik
##   89081.67 89162 -44529.83
## 
## Random effects:
##  Formula: ~1 | RID
##         (Intercept) Residual
## StdDev:    18.06619 11.65521
## 
## Fixed effects:  MMSE_norm ~ thirtile_PGS * VISCODE + AGE + PTGENDER + AGE * PTGENDER 
##                          Value Std.Error   DF    t-value p-value
## (Intercept)           96.56898  7.896629 9565  12.229140  0.0000
## thirtile_PGS2          1.29247  1.259224 1402   1.026404  0.3049
## thirtile_PGS3          3.96055  1.262542 1402   3.136965  0.0017
## VISCODE               -0.51677  0.049391 9565 -10.462816  0.0000
## AGE                   -0.33065  0.107495 1402  -3.075954  0.0021
## PTGENDERMale           7.12880 10.550542 1402   0.675681  0.4994
## thirtile_PGS2:VISCODE -0.17609  0.069654 9565  -2.528024  0.0115
## thirtile_PGS3:VISCODE  0.11814  0.066075 9565   1.787955  0.0738
## AGE:PTGENDERMale      -0.10948  0.142432 1402  -0.768645  0.4422
##  Correlation: 
##                       (Intr) th_PGS2 th_PGS3 VISCOD AGE    PTGEND t_PGS2:
## thirtile_PGS2         -0.063                                             
## thirtile_PGS3         -0.037  0.501                                      
## VISCODE               -0.026  0.167   0.167                              
## AGE                   -0.991 -0.014  -0.039  -0.001                      
## PTGENDERMale          -0.745 -0.003   0.007  -0.003  0.742               
## thirtile_PGS2:VISCODE  0.020 -0.236  -0.118  -0.709 -0.001 -0.002        
## thirtile_PGS3:VISCODE  0.021 -0.125  -0.231  -0.747 -0.001 -0.001  0.530 
## AGE:PTGENDERMale       0.749 -0.001  -0.011   0.002 -0.753 -0.995  0.002 
##                       t_PGS3:
## thirtile_PGS2                
## thirtile_PGS3                
## VISCODE                      
## AGE                          
## PTGENDERMale                 
## thirtile_PGS2:VISCODE        
## thirtile_PGS3:VISCODE        
## AGE:PTGENDERMale       0.001 
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -5.54926774 -0.56527471  0.05175528  0.60878475  3.42199916 
## 
## Number of Observations: 10976
## Number of Groups: 1408
## Linear mixed-effects model fit by REML
##   Data: long_dat 
##        AIC      BIC    logLik
##   89025.41 89105.74 -44501.71
## 
## Random effects:
##  Formula: ~1 | RID
##         (Intercept) Residual
## StdDev:    17.68171 11.65507
## 
## Fixed effects:  MMSE_norm ~ thirtile_years * VISCODE + AGE + PTGENDER + AGE *      PTGENDER 
##                            Value Std.Error   DF    t-value p-value
## (Intercept)             87.54754  7.814931 9565  11.202598  0.0000
## thirtile_years2          5.21864  1.247938 1402   4.181807  0.0000
## thirtile_years3          9.38216  1.255561 1402   7.472484  0.0000
## VISCODE                 -0.68955  0.049026 9565 -14.064980  0.0000
## AGE                     -0.23965  0.105606 1402  -2.269312  0.0234
## PTGENDERMale             8.52758 10.343701 1402   0.824422  0.4098
## thirtile_years2:VISCODE  0.19553  0.067952 9565   2.877406  0.0040
## thirtile_years3:VISCODE  0.28217  0.067083 9565   4.206329  0.0000
## AGE:PTGENDERMale        -0.14786  0.139700 1402  -1.058439  0.2900
##  Correlation: 
##                         (Intr) thrt_2 thrt_3 VISCOD AGE    PTGEND t_2:VI t_3:VI
## thirtile_years2         -0.114                                                 
## thirtile_years3         -0.143  0.514                                          
## VISCODE                 -0.023  0.166  0.165                                   
## AGE                     -0.992  0.044  0.076 -0.003                            
## PTGENDERMale            -0.740  0.002  0.018 -0.007  0.742                     
## thirtile_years2:VISCODE  0.021 -0.235 -0.119 -0.721 -0.002  0.001              
## thirtile_years3:VISCODE  0.016 -0.121 -0.232 -0.731  0.004  0.004  0.527       
## AGE:PTGENDERMale         0.746 -0.016 -0.035  0.007 -0.754 -0.995 -0.001 -0.004
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -5.54939049 -0.56916774  0.05352309  0.60785536  3.17037774 
## 
## Number of Observations: 10976
## Number of Groups: 1408
## Linear mixed-effects model fit by REML
##   Data: long_dat 
##        AIC      BIC    logLik
##   89041.86 89122.19 -44509.93
## 
## Random effects:
##  Formula: ~1 | RID
##         (Intercept) Residual
## StdDev:     17.8093 11.65369
## 
## Fixed effects:  MMSE_norm ~ thirtile_res * VISCODE + AGE + PTGENDER + AGE * PTGENDER 
##                          Value Std.Error   DF    t-value p-value
## (Intercept)           93.62021  7.825546 9565  11.963409  0.0000
## thirtile_res2          3.33917  1.244753 1402   2.682596  0.0074
## thirtile_res3          7.72301  1.242139 1402   6.217505  0.0000
## VISCODE               -0.64608  0.047394 9565 -13.632024  0.0000
## AGE                   -0.31728  0.106079 1402  -2.990990  0.0028
## PTGENDERMale           7.05168 10.411359 1402   0.677307  0.4983
## thirtile_res2:VISCODE  0.06678  0.067842 9565   0.984283  0.3250
## thirtile_res3:VISCODE  0.28563  0.065963 9565   4.330229  0.0000
## AGE:PTGENDERMale      -0.10850  0.140564 1402  -0.771889  0.4403
##  Correlation: 
##                       (Intr) thrt_2 thrt_3 VISCOD AGE    PTGEND t_2:VI t_3:VI
## thirtile_res2         -0.110                                                 
## thirtile_res3         -0.075  0.500                                          
## VISCODE               -0.022  0.165  0.165                                   
## AGE                   -0.991  0.035 -0.002 -0.004                            
## PTGENDERMale          -0.743  0.013  0.004 -0.005  0.743                     
## thirtile_res2:VISCODE  0.017 -0.234 -0.115 -0.699  0.002 -0.002              
## thirtile_res3:VISCODE  0.016 -0.118 -0.235 -0.718  0.003  0.002  0.502       
## AGE:PTGENDERMale       0.748 -0.019 -0.007  0.004 -0.754 -0.995  0.001 -0.002
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -5.57416905 -0.56354505  0.05274324  0.60754720  3.13720332 
## 
## Number of Observations: 10976
## Number of Groups: 1408
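
The summaries above share one structure: a random intercept per participant (`~1 | RID`) and the printed fixed-effects formula, fit by REML. A call of roughly the following shape with `nlme::lme()` produces output in that format. The data below are simulated, so the estimates are not comparable to the ones reported; only the formula and variable names are taken from the first summary.

```r
library(nlme)

set.seed(7)
# Simulated stand-in for long_dat: 60 participants, 5 visits each (values are made up)
n_id <- 60
n_visit <- 5
toy_long <- data.frame(
  RID      = factor(rep(seq_len(n_id), each = n_visit)),
  VISCODE  = rep(0:(n_visit - 1), times = n_id),
  EA22     = rep(rnorm(n_id), each = n_visit),
  AGE      = rep(runif(n_id, 55, 90), each = n_visit),
  PTGENDER = factor(rep(rep(c("Female", "Male"), length.out = n_id), each = n_visit))
)
subject_effect <- rep(rnorm(n_id, sd = 15), each = n_visit)
toy_long$MMSE_norm <- 90 + 4 * toy_long$EA22 - 0.5 * toy_long$VISCODE +
  subject_effect + rnorm(nrow(toy_long), sd = 10)

# Random intercept per participant; fixed effects as printed in the first summary
fit <- lme(MMSE_norm ~ EA22 * VISCODE + AGE + PTGENDER + AGE * PTGENDER,
           random = ~ 1 | RID, data = toy_long, method = "REML")

coef_table <- summary(fit)$tTable  # estimates, SEs, DF, t- and p-values
coef_table
```

The other five models would differ only in the predictor that replaces `EA22` (for example `PTEDUCAT`, `res`, or one of the tertile factors).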

Boxplots of MMSE by Age Group at Baseline

To see whether it is necessary to stratify by age group, the effects of the polygenic score for EA and of age group on MMSE were tested using linear regression. The results are displayed below.

## 
## Call:
## lm(formula = MMSE ~ EA22 + Age_Group, data = long_dat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -27.415  -1.189   1.109   2.458   3.473 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 27.12540    0.16657 162.844  < 2e-16 ***
## EA22         0.45636    0.11384   4.009 6.19e-05 ***
## Age_Group   -0.01018    0.06723  -0.151     0.88    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.642 on 4822 degrees of freedom
##   (6151 observations deleted due to missingness)
## Multiple R-squared:  0.003326,   Adjusted R-squared:  0.002912 
## F-statistic: 8.045 on 2 and 4822 DF,  p-value: 0.0003249
## 
## Call:
## lm(formula = MMSE_norm ~ EA22 + Age_Group, data = long_dat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -77.457 -14.481   1.461  21.439  29.791 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  75.4231     0.9389  80.330  < 2e-16 ***
## EA22          3.7284     0.6416   5.811 6.62e-09 ***
## Age_Group    -0.1925     0.3790  -0.508    0.611    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 20.53 on 4822 degrees of freedom
##   (6151 observations deleted due to missingness)
## Multiple R-squared:  0.006955,   Adjusted R-squared:  0.006543 
## F-statistic: 16.89 on 2 and 4822 DF,  p-value: 4.923e-08

MMSE Survival Analysis

Next, the survival analysis was conducted for the genetic capacity for educational attainment and for the residual educational attainment.

## [1] 1.758909e-30
## [1] 3.680355e-05
## [1] 4.033274e-25

## [1] 4.487522e-13
## [1] 2.182679e-09
## [1] 5.456451e-05

Log-rank test Underachiever: 2.4786648 × 10^{-7}
Log-rank test Average Performance: 0.7581133
Log-rank test Overachiever: 0.1303088

Alzheimer’s Disease Assessment Scale

The Cognitive Subscale Alzheimer’s Disease Assessment Scale (ADAS) is made of 11 tasks that include both subject-completed tests and observer-based assessments, assessing the memory, language, and praxis domains. The result is a global final score ranging from 0 to 70, based on the sum of the scores of the single tasks (ADAS11).

Beyond the ADAS11 score, the ADNI study included also an additional test of delayed word recall and a number cancellation or maze task, which are further summed to have a new total score that ranges from 0 to 85 (ADAS13).

In addition, the score of the task 4 (Word Recognition, ADASQ4) was included in the ADNIMERGE dataset.

(Grassi et al., 2019)

ADAS11

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), ADAS11_cut) ~ thirtile_res, 
##     data = .)
## 
## n=5973, 5 observations deleted due to missingness.
## 
##                   N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 2938     1238     1028      42.9      92.9
## thirtile_res=3 3035      875     1085      40.6      92.9
## 
##  Chisq= 92.9  on 1 degrees of freedom, p= <2e-16
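
Output of this form comes from `survival::survdiff()`, which performs a log-rank test between groups. The sketch below reproduces the shape of the call on simulated data. The one-row-per-participant layout, the event probabilities, and the restriction to tertiles 1 and 3 via `subset()` are assumptions for illustration, not the report's actual preprocessing.

```r
library(survival)

set.seed(3)
# Simulated stand-in: follow-up time in visit units (1..9) and residual tertile
n <- 300
toy <- data.frame(
  thirtile_res = sample(1:3, n, replace = TRUE),
  VISCODE      = sample(1:9, n, replace = TRUE)
)
# Made-up event indicator: the lowest tertile crosses the cut-off more often
toy$ADAS11_cut <- rbinom(n, 1, ifelse(toy$thirtile_res == 1, 0.6, 0.3))

# Compare only the lowest and highest residual tertiles, as in the printed output
toy_13 <- subset(toy, thirtile_res %in% c(1, 3))

fit <- survdiff(Surv(as.integer(VISCODE), ADAS11_cut) ~ thirtile_res,
                data = toy_13)
fit

# p-value of the log-rank chi-square statistic
p_value <- pchisq(fit$chisq, df = length(fit$n) - 1, lower.tail = FALSE)
p_value
```

In the printed table, an Observed count above the Expected count for a group means that group reached the cut-off more often than it would under the null hypothesis of no group difference.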

ADAS13

“The ADAS13 was included as a global measure of cognitive function. ADAS13 is a test battery developed to assess severity of cognitive impairment associated with AD and includes subtests and clinical evaluations assessing memory function, reasoning, language function, orientation and praxis. The ADAS13 is a modified version of the original ADAS-Cog-11, adding a cancellation task and a delayed free recall task. The higher the scores, the more severe impairment of cognitive function.” (Mofrad et al., 2021)

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), ADAS13_cut) ~ thirtile_res, 
##     data = .)
## 
## n=7358, 27 observations deleted due to missingness.
## 
##                   N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 3607     1602     1334      54.0       114
## thirtile_res=3 3751     1122     1390      51.8       114
## 
##  Chisq= 114  on 1 degrees of freedom, p= <2e-16

ADASQ4

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), ADASQ4_cut) ~ thirtile_res, 
##     data = .)
## 
##                   N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 3617     1478     1259      38.2      79.6
## thirtile_res=3 3768     1101     1320      36.4      79.6
## 
##  Chisq= 79.6  on 1 degrees of freedom, p= <2e-16

CDRSB

“The clinical dementia rating (CDR) scale is commonly used to diagnose dementia due to Alzheimer’s disease (AD). The sum of boxes of the CDR (CDR-SB) has recently been emphasized and applied to interventional trials for tracing the progression of cognitive impairment (CI) in the early stages of AD.” (Tzeng et al., 2022)

See Table 3 for explanation on the staging category (O’Bryant et al., 2012)

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), CDRSB_cut) ~ thirtile_res, 
##     data = .)
## 
##                   N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 3617     2748     2528      19.2      42.7
## thirtile_res=3 3768     2433     2653      18.3      42.7
## 
##  Chisq= 42.7  on 1 degrees of freedom, p= 6e-11

DIGITSCORE

“The DSST (Digit Symbol Substitution Test) is a paper-and-pencil cognitive test presented on a single sheet of paper that requires a subject to match symbols to numbers according to a key located on the top of the page. The subject copies the symbol into spaces below a row of numbers. The number of correct symbols within the allowed time, usually 90 to 120 seconds, constitutes the score.” (Jaeger, 2018) The lower the scores, the more severe impairment of cognitive function.

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), DIGITSCOR_cut) ~ 
##     thirtile_res, data = .)
## 
## n=4095, 3290 observations deleted due to missingness.
## 
##                   N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 2137      448      358      22.6      47.6
## thirtile_res=3 1958      253      343      23.6      47.6
## 
##  Chisq= 47.6  on 1 degrees of freedom, p= 5e-12

FAQ

The Functional Activities Questionnaire (FAQ) is used to assess an individual’s functional abilities in daily living activities. It is a caregiver-based questionnaire that helps evaluate how well a person is able to perform various instrumental activities of daily living (IADLs) and basic activities of daily living (ADLs). (ChatGPT) The total score is the sum of all 10 item scores (range 0–30). Each item is scored 0–3, with higher scores indicating greater impairment (0 = normal or never did but could do now; 1 = has difficulty but does by self or never did but would have difficulty now; 2 = requires assistance; 3 = dependent). There is no established cut-off score for IADL impairment on the FAQ. However, one study reported that a total FAQ score of ≥ 6 is suggestive of functional impairment. (Marshall et al., 2015)
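
A binary event indicator such as `FAQ_cut` could be derived from a threshold like this. The sketch uses made-up scores and assumes the ≥ 6 threshold mentioned above; the report does not show how the `_cut` variables were actually built.

```r
# Made-up FAQ total scores (range 0-30); NA marks a missing assessment
toy <- data.frame(RID = 1:6, FAQ = c(0, 3, 6, 12, NA, 5))

# Event indicator: 1 if the score reaches the threshold, 0 otherwise; NA stays NA
faq_threshold <- 6
toy$FAQ_cut <- as.integer(toy$FAQ >= faq_threshold)

toy$FAQ_cut  # 0 0 1 1 NA 0
```

For scales where lower scores indicate worse functioning, the comparison would be reversed (`<=` or `<`).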

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), FAQ_cut) ~ thirtile_res, 
##     data = .)
## 
## n=7382, 3 observations deleted due to missingness.
## 
##                   N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 3617     1249     1060      33.7      69.8
## thirtile_res=3 3765      923     1112      32.1      69.8
## 
##  Chisq= 69.8  on 1 degrees of freedom, p= <2e-16

LDELTOTAL

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), LDELTOTAL_cut) ~ 
##     thirtile_res, data = .)
## 
##                   N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 3617     1992     1557       122       257
## thirtile_res=3 3768     1198     1633       116       257
## 
##  Chisq= 257  on 1 degrees of freedom, p= <2e-16

MOCA

Reference literature: doi: 10.1111/j.1532-5415.2005.53221.x

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), MOCA_cut) ~ thirtile_res, 
##     data = .)
## 
## n=3915, 3470 observations deleted due to missingness.
## 
##                   N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 1792     1322     1256      3.47      7.66
## thirtile_res=3 2123     1336     1402      3.11      7.66
## 
##  Chisq= 7.7  on 1 degrees of freedom, p= 0.006

Rey-Auditory Verbal Learning Test (RAVLT)

The RAVLT was included as a measure of memory function. In this test, the participants are asked to recall words from a list of 15 nouns immediately after each of five learning trials and after a short and a long delay. Two measures known to be sensitive to cognitive changes in patients with AD were included in the present study: Immediate recall (RAVLT-Im): the number of correct responses across the immediate recall of the five learning trials; percent forgetting (RAVLT-PF): the score on the fifth learning trial minus the score on the long delayed recall, divided by the score obtained on the fifth learning trial. The lower the scores, the more severe impairment of cognitive function.

Different summary scores are derived from raw RAVLT scores. These include RAVLT Immediate (the sum of scores from 5 first trials (Trials 1 to 5)), RAVLT Learning (the score of Trial 5 minus the score of Trial 1), RAVLT Forgetting (the score of Trial 5 minus score of the delayed recall) and RAVLT Percent Forgetting (RAVLT Forgetting divided by the score of Trial 5). We use naming of the ADNI merge table for these summary measures. We investigated the relationship between MRI measures and RAVLT cognitive test scores by estimating the RAVLT Immediate and RAVLT Percent Forgetting from the gray matter density. These two summary scores were selected since they highlight different aspects of episodic memory, learning (RAVLT Immediate) and delayed memory (RAVLT Percent forgetting), essential to AD and previous studies (Estévez-González et al., 2003, Wang et al., 2011, Gomar et al., 2014, Moradi et al., 2015) have indicated strong relationships between these two RAVLT measures and Alzheimer’s disease.
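
The four summary scores defined above can be written out as a short worked example. The helper `ravlt_summary()` is hypothetical (it is not part of ADNIMERGE or any package), and expressing percent forgetting on a 0 to 100 scale is an assumption.

```r
# Hypothetical helper: `trials` holds the five learning-trial scores,
# `delayed` is the long-delay recall score (each out of 15 words)
ravlt_summary <- function(trials, delayed) {
  stopifnot(length(trials) == 5)
  forgetting <- trials[5] - delayed
  c(
    immediate       = sum(trials),                    # sum of Trials 1 to 5
    learning        = trials[5] - trials[1],          # Trial 5 minus Trial 1
    forgetting      = forgetting,                     # Trial 5 minus delayed recall
    perc_forgetting = 100 * forgetting / trials[5]    # as a percentage (assumption)
  )
}

scores <- ravlt_summary(trials = c(4, 6, 8, 9, 10), delayed = 7)
scores
# immediate = 37, learning = 6, forgetting = 3, perc_forgetting = 30
```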

RAVLT Immediate

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), RAVLT_immediate_cut) ~ 
##     thirtile_res, data = .)
## 
## n=7367, 18 observations deleted due to missingness.
## 
##                   N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 3609     2579     2772      13.4      30.3
## thirtile_res=3 3758     3095     2902      12.8      30.3
## 
##  Chisq= 30.3  on 1 degrees of freedom, p= 4e-08

RAVLT Percentage Forgetting

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), RAVLT_perc_forgetting_cut) ~ 
##     thirtile_res, data = .)
## 
## n=7358, 27 observations deleted due to missingness.
## 
##                   N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 3600     1201     1057      19.6      40.5
## thirtile_res=3 3758      966     1110      18.7      40.5
## 
##  Chisq= 40.5  on 1 degrees of freedom, p= 2e-10

RAVLT Forgetting

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), RAVLT_forgetting_cut) ~ 
##     thirtile_res, data = .)
## 
## n=7367, 18 observations deleted due to missingness.
## 
##                   N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 3609       67     81.5      2.57      5.04
## thirtile_res=3 3758      100     85.5      2.45      5.04
## 
##  Chisq= 5  on 1 degrees of freedom, p= 0.02

RAVLT Learning

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), RAVLT_learning_cut) ~ 
##     thirtile_res, data = .)
## 
## n=7367, 18 observations deleted due to missingness.
## 
##                   N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 3609       41     44.3     0.248     0.484
## thirtile_res=3 3758       50     46.7     0.235     0.484
## 
##  Chisq= 0.5  on 1 degrees of freedom, p= 0.5

TRABSCORE

The Trail Making Test is a neuropsychological test of visual attention and task switching. It has two parts, in which the subject is instructed to connect a set of 25 dots as quickly as possible while maintaining accuracy.

The test can provide information about visual search speed, scanning, speed of processing, mental flexibility, and executive functioning. It is sensitive to cognitive impairment associated with dementia, including Alzheimer’s disease. (ChatGPT)

Record the total number of seconds to complete Part B (Trails B), up to a maximum of 300 seconds. If the participant is not finished by 300 seconds, the score is 300.

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), TRABSCOR_cut) ~ 
##     thirtile_res, data = .)
## 
## n=7322, 63 observations deleted due to missingness.
## 
##                   N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 3579      915      733      45.4        92
## thirtile_res=3 3743      589      771      43.1        92
## 
##  Chisq= 92  on 1 degrees of freedom, p= <2e-16

Patient’s Everyday Cognition (EcogPt)

The original version of the ECog is an informant-based measure of cognitively-relevant everyday abilities comprised of 39 items, covering six cognitively-relevant domains: Everyday Memory, Everyday Language, Everyday Visuospatial Abilities, Everyday Planning, Everyday Organization, and Everyday Divided Attention. Ratings are made on a four-point scale: 1 = better or no change compared to 10 years earlier, 2 = questionable/occasionally worse, 3 = consistently a little worse, 4 = consistently much worse. (Tomaszewski Farias et al., 2012)

EcogPt Everyday Divided Attention

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogPtDivatt_cut) ~ 
##     thirtile_res, data = .)
## 
## n=3911, 3474 observations deleted due to missingness.
## 
##                   N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 1793      189      213      2.78      5.34
## thirtile_res=3 2118      268      244      2.44      5.34
## 
##  Chisq= 5.3  on 1 degrees of freedom, p= 0.02

EcogPt Everyday Language

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogPtLang_cut) ~ 
##     thirtile_res, data = .)
## 
## n=3943, 3442 observations deleted due to missingness.
## 
##                   N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 1803      310      266      7.40      14.3
## thirtile_res=3 2140      259      303      6.48      14.3
## 
##  Chisq= 14.3  on 1 degrees of freedom, p= 2e-04

EcogPt Everyday Memory

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogPtMem_cut) ~ 
##     thirtile_res, data = .)
## 
## n=3948, 3437 observations deleted due to missingness.
## 
##                   N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 1801      348      341     0.126     0.245
## thirtile_res=3 2147      386      393     0.110     0.245
## 
##  Chisq= 0.2  on 1 degrees of freedom, p= 0.6

EcogPt Everyday Organization

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogPtOrgan_cut) ~ 
##     thirtile_res, data = .)
## 
## n=3878, 3507 observations deleted due to missingness.
## 
##                   N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 1760      300      310     0.323     0.625
## thirtile_res=3 2118      369      359     0.279     0.625
## 
##  Chisq= 0.6  on 1 degrees of freedom, p= 0.4

EcogPt Everyday Planning

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogPtPlan_cut) ~ 
##     thirtile_res, data = .)
## 
## n=3939, 3446 observations deleted due to missingness.
## 
##                   N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 1801      439      410      2.10      4.12
## thirtile_res=3 2138      439      468      1.84      4.12
## 
##  Chisq= 4.1  on 1 degrees of freedom, p= 0.04

EcogPt Everyday Visuospatial Abilities

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogPtVisspat_cut) ~ 
##     thirtile_res, data = .)
## 
## n=3921, 3464 observations deleted due to missingness.
## 
##                   N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 1800      356      330      2.00      3.93
## thirtile_res=3 2121      347      373      1.77      3.93
## 
##  Chisq= 3.9  on 1 degrees of freedom, p= 0.05

EcogPt Total ???

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogPtTotal_cut) ~ 
##     thirtile_res, data = .)
## 
## n=3943, 3442 observations deleted due to missingness.
## 
##                   N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 1803      368      346      1.44       2.8
## thirtile_res=3 2140      373      395      1.26       2.8
## 
##  Chisq= 2.8  on 1 degrees of freedom, p= 0.09

Study Partner’s Everyday Cognition (EcogSP)

EcogSP Everyday Divided Attention

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogSPDivatt_cut) ~ 
##     thirtile_res, data = .)
## 
## n=3936, 3449 observations deleted due to missingness.
## 
##                   N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 1817      618      559      6.22      12.7
## thirtile_res=3 2119      559      618      5.63      12.7
## 
##  Chisq= 12.7  on 1 degrees of freedom, p= 4e-04

EcogSP Everyday Language

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogSPLang_cut) ~ 
##     thirtile_res, data = .)
## 
## n=4011, 3374 observations deleted due to missingness.
## 
##                   N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 1845      767      696      7.23        15
## thirtile_res=3 2166      698      769      6.55        15
## 
##  Chisq= 15  on 1 degrees of freedom, p= 1e-04

EcogSP Everyday Memory

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogSPMem_cut) ~ 
##     thirtile_res, data = .)
## 
## n=4011, 3374 observations deleted due to missingness.
## 
##                   N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 1845      755      695      5.21      10.7
## thirtile_res=3 2166      708      768      4.72      10.7
## 
##  Chisq= 10.7  on 1 degrees of freedom, p= 0.001

EcogSP Everyday Organization

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogSPOrgan_cut) ~ 
##     thirtile_res, data = .)
## 
## n=3867, 3518 observations deleted due to missingness.
## 
##                   N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 1764      522      495      1.47      2.97
## thirtile_res=3 2103      529      556      1.31      2.97
## 
##  Chisq= 3  on 1 degrees of freedom, p= 0.08

EcogSP Everyday Planning

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogSPPlan_cut) ~ 
##     thirtile_res, data = .)
## 
## n=3981, 3404 observations deleted due to missingness.
## 
##                   N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 1828      694      615     10.09      20.8
## thirtile_res=3 2153      603      682      9.11      20.8
## 
##  Chisq= 20.8  on 1 degrees of freedom, p= 5e-06

EcogSP Everyday Visuospatial Abilities

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogSPVisspat_cut) ~ 
##     thirtile_res, data = .)
## 
## n=3973, 3412 observations deleted due to missingness.
## 
##                   N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 1818      687      606     10.78        22
## thirtile_res=3 2155      605      686      9.53        22
## 
##  Chisq= 22  on 1 degrees of freedom, p= 3e-06

EcogSP Total ???

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogSPTotal_cut) ~ 
##     thirtile_res, data = .)
## 
## n=4003, 3382 observations deleted due to missingness.
## 
##                   N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 1844      790      713      8.29      17.2
## thirtile_res=3 2159      709      786      7.52      17.2
## 
##  Chisq= 17.2  on 1 degrees of freedom, p= 3e-05
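The same `survdiff` call is repeated for every `_cut` outcome above. One way to run the tests in a loop and collect the statistics in a single table is sketched below. The data frame `sim` is simulated and only stands in for the real data; the object names `outcomes`, `logrank` and `results` are hypothetical.

```r
library(survival)

set.seed(1)
n <- 200

# Simulated stand-in for the analysis data; all values are made up
sim <- data.frame(
  VISCODE        = sample(1:10, n, replace = TRUE),
  thirtile_res   = sample(c(1, 3), n, replace = TRUE),
  EcogSPMem_cut  = rbinom(n, 1, 0.3),
  EcogSPLang_cut = rbinom(n, 1, 0.3)
)

outcomes <- c("EcogSPMem_cut", "EcogSPLang_cut")

# Run one log-rank test per outcome and keep the statistic and p-value
logrank <- lapply(outcomes, function(v) {
  f <- as.formula(paste0("Surv(as.integer(VISCODE), ", v, ") ~ thirtile_res"))
  fit <- survdiff(f, data = sim)
  data.frame(
    outcome = v,
    chisq   = unname(fit$chisq),
    p       = pchisq(fit$chisq, df = length(fit$n) - 1, lower.tail = FALSE)
  )
})

results <- do.call(rbind, logrank)
results
```

Collecting the results this way makes it easier to compare domains side by side than reading the individual printouts.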

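# Restrict to participants aged 60 to 70. AGE is a double (e.g. 74.3), so
# `AGE %in% 60:70` would keep only exact integer ages; filter on a range instead.
test <- long_dat %>% filter(between(AGE, 60, 70))

# TODO: also test participants aged 65 to 75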
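AGE is stored as a double (e.g. 74.3 in the wide table), so matching it against the integer sequence `60:70` keeps only whole-number ages. A small sketch with made-up ages shows the difference between set membership and a range comparison:

```r
# Hypothetical ages, stored as doubles like AGE in the data
age <- c(59.9, 60, 64.3, 67.5, 70, 70.1)

# Set membership: TRUE only for values exactly equal to 60, 61, ..., 70
in_set <- age %in% 60:70

# Range comparison: TRUE for every value in the closed interval [60, 70]
in_range <- age >= 60 & age <= 70

sum(in_set)    # 2
sum(in_range)  # 4
```

The same range comparison can be reused for the 65 to 75 window by changing the two bounds.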